
@madhu-reddy-peram commented Nov 14, 2025

Description:
This PR improves dashboard performance and prevents query timeouts by scoping kube_pod_info time series queries to the selected cluster (via clusterLabel) instead of scanning all 150 Kubernetes clusters.

Problem
Currently, dashboard queries that use kube_pod_info fetch pod metadata from every cluster in our single large Mimir (Prometheus-compatible) instance, regardless of the configured clusterLabel. This leads to:

  • Timeouts on short time ranges (e.g., 5 minutes): about 136,684 time series are fetched, as shown in the screenshot below.

[Screenshot]

  • Backend failures on longer ranges (e.g., 2 hours): queries exceed the maximum series limit, which often slows down the Prometheus backend.

Solution
Align kube_pod_info filtering with the other network metrics by restricting its queries to the cluster selected through the clusterLabel dashboard variable.
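As a rough illustration of the idea, the following Python sketch builds a cluster-scoped kube_pod_info selector and joins it to a network metric that carries the same cluster matcher. The helper names, the default label name `cluster`, the Grafana variable `$cluster`, the metric `container_network_receive_bytes_total`, and the join labels are hypothetical placeholders; they are not taken from the dashboards this PR changes.

```python
# Hypothetical sketch: builds PromQL strings scoped to a single cluster.
# The label name, Grafana variable, and metric names are assumptions.


def scoped_selector(metric: str, cluster_label: str = "cluster",
                    cluster_var: str = "$cluster") -> str:
    """Return a selector such as kube_pod_info{cluster="$cluster"}."""
    return f'{metric}{{{cluster_label}="{cluster_var}"}}'


def network_rx_by_node(cluster_label: str = "cluster") -> str:
    """Hypothetical join of a network metric with kube_pod_info.

    Both sides use the same cluster matcher, so the kube_pod_info side
    only returns series for the selected cluster.
    """
    rx = scoped_selector("container_network_receive_bytes_total", cluster_label)
    info = scoped_selector("kube_pod_info", cluster_label)
    return (
        f"sum by (node) (rate({rx}[5m]) "
        f"* on ({cluster_label}, namespace, pod) group_left(node) {info})"
    )


if __name__ == "__main__":
    print(network_rx_by_node())
```

Scoping both sides keeps the right-hand side of the join limited to one cluster's pods. Including the cluster label in on(...) also prevents pods with the same namespace and name in different clusters from colliding in the match.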

Impact

  • Drastically reduces series count and query latency
  • Eliminates timeouts and backend errors
  • Maintains accuracy for single-cluster dashboards
